FIELD OF THE INVENTION

This invention relates to methods useful for disease diagnosis by detecting the 
presence of genetic mutations, including deletions, in cellular samples containing a 
small amount of mutated genetic material dispersed within a major amount of 
diagnostically-irrelevant (normal) genetic material. Methods of the invention are 
especially useful in the detection of genetic mutations characteristic of cancer. 
BACKGROUND OF THE INVENTION 

Cancer is a disease characterized by genomic instability. Generally, genomic 
instability defines a broad class of disruptions in genomic nucleotide sequences. Such 
disruptions include the loss of heterozygosity (usually characterized by massive loss of 
chromosomal DNA), microsatellite instability (usually indicative of defects in DNA repair 
mechanisms), and mutations (which include insertions, deletions, substitutions, 
duplications, rearrangements, or modifications). Numerous genomic instabilities have 
been associated with cancer. For example, mutations in a number of oncogenes and 
tumor suppressor genes have been implicated in tumorigenesis. Duffy, Clin. Chem.,
41: 1410-1413 (1993). In addition, the loss of heterozygosity at the p53 tumor
suppressor locus has been correlated with various types of cancer. Ridanpaa, et al.,
Path. Res. Pract., 191: 399-402 (1995). The loss or other mutation of the apc and dcc
tumor suppressor genes has also been associated with tumor development. Blum,
Europ. J. Cancer, 31A: 1369-1372 (1995). Finally, tumorigenesis has also been
correlated with microsatellite instability. 

Genetic changes characteristic of genomic instability theoretically can serve as 
markers for the early stages of, for example, colon cancer, and can be detected in DNA 
isolated from biopsied colonic epithelium and in some cases from transformed cells 
shed into fecal material. Sidransky, et al., Science, 256: 102-105 (1992).
Detection methods proposed in the art are time-consuming and expensive.
Duffy, supra. Moreover, methods according to the art cannot be used to identify a loss
of heterozygosity or microsatellite instability in a small subpopulation of cells when the
cells exist in a heterogeneous (i.e., clonally impure) sample. For example, in U.S. 
Patent No. 5,527,676, it is stated that tissue samples in which a mutation is to be 
detected should be enriched for tumor cells in order to detect the loss of heterozygosity 
in a p53 gene. 

Techniques, such as PCR, have been used to detect a loss of heterozygosity 
resulting from massive deletions characteristic of late-stage adenomas. See, e.g., U.S. 
Patent No. 5,330,892. Such techniques generally require the use of large numbers of 
primer pairs, and they will not work at all in a heterogeneous sample. A recent 
publication reports the use of PCR and ELISA techniques to perform quantitative 
analysis of mutations in early-stage tumors. U.S. Patent No. 5,512,441. Identification 
of an abnormal subpopulation of cells in a heterogeneous sample of mostly normal 
cells and cellular debris, which subpopulation is characterized by loss of heterozygosity 
or microsatellite instability, is even more difficult because such detection involves the 
identification of a subpopulation of nucleotide fragments that are difficult to distinguish 
from the sea of heterogeneous normal cellular material in which they exist. A further 
problem is that mutation at any locus of a growing number of different oncogenes or 
tumor suppressor genes can result in cancer, and a screening approach capable of 
scanning all or even most loci in these genes is not currently available. 

Microsatellite instability may also be a marker for cancer. Microsatellites are 
dispersed throughout the genome at an average frequency of about 1 in 100,000 base
pairs. They comprise tandem or trinucleotide repeats that normally are inherited in a
stable fashion. See, e.g., Charlesworth, et al., Nature, 371: 215-220 (1994). While
these sequences perform no known function in the genome, many of them have been 
mapped and have been used as markers based upon their sequence length 
polymorphisms. Clonal changes in microsatellite DNA associated with defects in 
mismatch repair pathways are thought to be suitable markers for hereditary non- 
polyposis colorectal cancer (HNPCC). In HNPCC tumor samples, for example, 
microsatellites were found to have multiple insertions and/or deletions. Microsatellite
instability may be an effective marker for failure of mismatch repair in oncogenes or in
tumor suppressor genes. While microsatellite instability itself is not indicative of
cancer, it is evidence that mutations may occur in regions that are critical for onset of
cancer. It follows that detection of instability in microsatellites indicates that the patient
is at risk of developing a clonal subpopulation of cancer cells. 

Colorectal cancer is a common cause of death in Western society. Any tumor or 
precancerous polyp that develops along the length of the colon or the rectum sheds 
cells or DNA from cells into the lumen of the colon. Shed cells or cellular DNA are 
usually incorporated onto stool as stool passes through the colon. In the early stages of
cancer, cancerous or precancerous cells represent a very small fraction of the shed 
epithelial cells or DNA in stool. Current methods for detection of colorectal cancer do 
not focus on detecting cancerous or precancerous cells in stool. Rather, such methods 
typically focus on extracellular indicia of the presence of cancer, such as the presence 
of fecal occult blood or carcinoembryonic antigen circulating in serum. 

It is known, however, that both sporadic and hereditary colorectal cancers result 
from mutations in oncogenes and tumor suppressor genes. Such mutations appear to 
occur at a point in the etiology of the disease that is much earlier than the point at 
which extracellular indicia or clinical signs of cancer are observed. If detected early, 
colon cancer may be effectively treated by surgical removal of the cancerous tissue. 
Surgical removal of early-stage colon cancer is usually successful because colon 
cancer begins in cells of the colonic epithelium and is isolated from the general 
circulation until the occurrence of invasion through the epithelial lining. Thus, detection 
of early mutations in colorectal cells would greatly increase survival rate. 

Current non-invasive methods for detection of colon cancer involve the detection 
of fecal occult blood and carcinoembryonic antigen. These screening methods often 
either fail to detect colorectal cancer or they detect colorectal cancer only after it has 
progressed to an untreatable stage. Moreover, carcinoembryonic antigen is thought 
not to be an effective predictor of cancer but merely an indicator of recurrent cancer. 

Invasive techniques, such as endoscopy, while effective, are expensive and 
painful and suffer from low patient compliance. Accordingly, current colon cancer 
screening methods are not practical for screening large segments of the population. 
See, e.g., Blum, Europ. J. Cancer, 31A: 1369-1372 (1995). 
Therefore, there is a need in the art for simple and efficient non-invasive
methods for reliable large-scale screening to identify individuals with early stage colon 
cancer. Such methods are provided herein. 

SUMMARY OF THE INVENTION 

The present invention provides methods for detecting a subpopulation of 
genomically transformed cells or cellular debris. Such methods detect the presence in 
a biological sample of a clonal subpopulation of cells which have a genome different 
from that of the wild type, and from bacterial, parasitic, or contaminating organisms that 
may also be present in the sample. Practice of the invention permits, for example, 
detection of a trace amount of DNA derived from cancer or precancer cells in a 
biological sample containing a majority of "normal" DNA or whole cells. A preferred use 
of the methods is to reliably detect in a stool sample voided by a patient the presence 
of a trace amount of cells and/or cellular debris containing DNA shed into the colon at 
the site of an asymptomatic precancerous or cancerous lesion. The invention takes 
advantage of several important insights which permit, for example, reliable detection of 
a DNA deletion at a known genomic site characteristic of a known cancer cell type. 

In general, the invention comprises the comparative measurement of two 
genomic sequences. One genomic sequence is stable through transformation, (i.e., it 
is identical in both malignant and wild type cells in the sample). A second genomic 
sequence typically undergoes change during the course of transformation (i.e., it is 
mutated during the development of malignant precursor cells). Hybridization probes 
are used to detect the presence of each genomic sequence. If the number of 
hybridization events involving the two genomic sequences is different, the difference 
may be due to insignificant background or it may be due to a statistically-significant 
difference in the quantities of the two genomic sequences in the population from which 
the sample was drawn. In the latter case, the difference can be correlated, to a degree 
of defined statistical confidence, with the presence in the sample of a subpopulation of 
cells having an altered (i.e., non-wild type) genomic sequence. 

The invention may be divided into three general embodiments. (1) In a first 
general embodiment, a quantitative amount (number of copies) of a gene or gene 
fragment of interest in a sample (i.e. a gene the mutation of which is known or 
suspected to be associated with cancer) is compared to a quantitative amount of a
reference gene or gene fragment in the sample, the reference gene being a gene
which is not normally associated with cancer and which normally has a low rate of
mutation. A statistically-significant difference between the two quantitative amounts is
indicative of genomic instability in a cellular subpopulation in the sample. (2) In a
second general embodiment of the invention, a quantitative amount of a region on a
maternal allele is compared to a quantitative amount of the corresponding region on a
paternal allele. A statistically-significant difference between the two quantitative
amounts is indicative of genomic instability. (3) In a third general embodiment, the
number of microsatellite repeats at a particular locus is compared between maternal
and paternal alleles. A statistically-significant difference in those numbers is indicative
of an error in the mismatch repair mechanisms in a subpopulation of cells in the sample
or may indicate that allelic loss has taken place. As stated above, errors in mismatch
repair may result in mutations in tumor suppressor genes or in oncogenes. Cancer
detection in any of the three embodiments of the invention is achieved by measuring
the number of hybridizations between at least two different nucleotide probes and their
respective genomic sequences.

One feature of the invention is that it has now been recognized that materials
from cells lining the colon (e.g., a polyp or lesion) are shed onto forming stool only in a
region comprising a longitudinal stripe along the length of the stool. Thus, unless the
stool sample under investigation is a whole stool or comprises at least a cross-section
of a stool, the sample will contain the relevant diagnostic information only by chance.
The colon contains numerous bends and folds throughout its length. See U.S. Patent
Number 5,741,650. Epithelial cells lining the colon normally migrate from a basal position
in colonic crypts, where stem cells divide by mitosis, to the top of the crypts and are then
shed into the lumen. Colonic epithelial cells that line the intestinal lumen typically undergo
regeneration every four to five days as a result of the rapid turnover rate through the
epithelium. Accordingly, sloughed epithelial cells or their DNA are constantly being
deposited in the forming stool as it passes through the lumen. As the stool proceeds toward
the rectum and becomes progressively more solid (from an initial liquid state), epithelial cells
are only sloughed onto the portion of the stool making
contact with the portion of the lumen that formerly contained those cells in its epithelial
lining. Epithelial cells of a polyp (a polyp is a pre-cancerous growth; while not all
polyps become cancerous, almost all cancers arise from polyps) undergo the same 
rapid life cycle and shedding described above for normal colonic epithelial cells. 
Accordingly, cells shed from polyps are typically only absorbed onto the surface of the 
forming stool that makes contact with the polyp. However, if the stool is in a liquid 
state, mixing of shed polyp cells throughout the stool occurs automatically. 

Accordingly, the present invention provides methods for detecting genomic 
changes in a subpopulation of cells in a sample of biological material. Methods of the 
invention are useful for the detection of changes in the nucleotide sequence of an 
allele in a small subpopulation of cells present in a large, heterogeneous sample of 
diagnostically-irrelevant biological material. Methods of the invention are useful for the 
detection and diagnosis of a genetic abnormality, such as a loss of heterozygosity or, 
more generally, a mutation, which can be correlated with a disease, such as cancer. 
For purposes of the present invention, unless the context requires otherwise, a 
"mutation" includes modifications, rearrangements, deletions, substitutions, and 
additions in a portion of genomic DNA or its corresponding mRNA. 

In a preferred embodiment, the invention provides methods for detecting a clonal 
subpopulation of transformed cells contained in, or suspected of being contained in, a 
biological sample obtained from an organism, such as a human. The methods 
comprise the steps of determining from the biological sample a number X of a first wild- 
type polynucleotide that is known or suspected not to be mutated in either wild-type 
cells or in transformed cells. A further step comprises determining from the biological 
sample a number Y of a second wild-type polynucleotide suspected of being mutated in 
a subpopulation of cells in the biological sample. Then, one determines whether a 
statistically-significant difference exists between X and Y. In a normal sample there is 
no mutation and so there is no statistically-significant difference between the number 
of each of the somatic genes in a normal cell. As a result, X and Y are not different 
from each other in a statistically significant sense. In contrast, the presence of a 
statistically significant difference between X and Y is indicative of the presence of a 
clonal subpopulation of transformed cells in the biological sample embodying a 
mutation. Statistical significance may be determined by any method known in the art. 
However, no formal measurement of statistical significance need be performed in
connection with any given assay. Rather, assays are designed to detect a large 
enough number of binding events such that at least a threshold difference between the 
numbers is dispositive of the issue of the presence of a mutant subpopulation of cells 
at any desired level of certainty. 
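
The comparison of X and Y described above can be illustrated with a minimal Python sketch. It assumes that the two counts are independent and approximately Poisson-distributed (the detailed description makes a similar assumption), so that in a normal sample the difference X - Y has a standard deviation of roughly the square root of X + Y. The function name `count_difference_z` and the example counts are hypothetical and are not taken from the specification.

```python
import math


def count_difference_z(x: int, y: int) -> float:
    """Return (x - y) in units of its approximate standard deviation.

    Assumption: x and y are independent Poisson counts with a common
    mean when no mutation is present, so Var(x - y) is about x + y.
    """
    if x + y == 0:
        raise ValueError("at least one detection event is required")
    return (x - y) / math.sqrt(x + y)


# Hypothetical counts of reference (X) and target (Y) detection events.
z_normal = count_difference_z(1_000_000, 1_000_000)   # no difference
z_deleted = count_difference_z(1_000_000, 990_000)    # target under-represented
```

A value near zero is consistent with a normal sample. A large positive value suggests loss of the target sequence in some cells, and a large negative value suggests its amplification. The cutoff applied to this value is a design choice that depends on the desired level of certainty.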

Also, in a preferred embodiment, transformed cells sought to be detected using 
methods according to the invention are malignant cells. Transformed cells detected 
according to methods of the invention may be induced transformants, transformed, for 
example, by a virus, by radiation, or by chemical or other carcinogenic means. 
Methods of the invention may be performed on any biological sample, including tissue 
and body fluid samples. Particularly preferred biological samples include pus, sputum, 
semen, blood, saliva, cerebrospinal fluid, and urine. In an important embodiment of the 
invention the sample is stool which is analyzed to detect colorectal cancer or 
precancer. Methods of the invention may be practiced by exposing the biological 
sample to one or more oligonucleotide probes in order to separately detect the number 
X of a first polynucleotide and the number Y of a second polynucleotide. Probes for 
use in the invention are detectably labeled. Preferred labels include fluorescent labels 
attached, for example, by affinity binding pairs (such as carbohydrate/lectin or 
avidin/biotin). Highly-preferred labels are microscopic particles which are counted by a 
detection apparatus, preferably a high-speed electronic apparatus as disclosed herein. 
The numbers X and Y are preferably proportional and most preferably equal to the 
number of target polynucleotide detection events occurring in the biological sample. 

Methods of the invention are especially useful for the detection of colorectal 
cancer or precancerous cells in humans. For purposes of the present invention, 
precancerous cells are cells that have a mutation that is associated with cancer, and 
which renders such cells susceptible to becoming cancerous. Such methods comprise 
determining whether cells or nucleotide debris in a stool sample include a deletion of a 
polynucleotide normally present in a wild-type genome of the human or other mammal. 
The sample may be exposed to a plurality of first and second oligonucleotide probes 
under hybridization conditions, thereby to hybridize (i) first probe to copies of a first 
polynucleotide segment characteristic of a wild-type genomic region known or 
suspected not to be deleted in cells of the sample and (ii) second probe to copies of a 
second polynucleotide segment characteristic of the wild-type genomic region
suspected of being mutated in the sample. The number of duplexes formed with each 
of the first and second probes is then detected and counted. The presence of a 
statistically-significant difference in those two numbers is indicative of the presence in 
the sample of a mutation that may be characteristic of colorectal cancer. Endoscopy or 
other visual examination procedures are then indicated. 

In a preferred embodiment, probes are labeled with beads or particles. In this 
embodiment, probes used for detection of genomic polynucleotide segments in a 
sample are preferably bound to such beads in a ratio of one probe to one bead, and 
the beads linked to the first and second probes are distinguishable, for example, by 
size. Use of such hybridization beads or particles facilitates the quantitative detection 
of genomic polynucleotide segments in the sample using, for example, an impedance 
counter, such as a "Coulter counter". 
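
As a rough illustration of how size-distinguishable beads might be tallied, the following Python sketch sorts impedance pulses into two size windows. It relies on the general principle that, in an impedance counter, pulse height grows with particle volume. The function name `count_beads`, the window limits, and the pulse values are all hypothetical; actual values would depend on the beads and the instrument.

```python
from collections import Counter
from typing import Iterable


def count_beads(pulse_heights: Iterable[float],
                small_window: tuple[float, float],
                large_window: tuple[float, float]) -> Counter:
    """Tally pulses that fall in each (low, high) pulse-height window."""
    counts = Counter(small=0, large=0, rejected=0)
    for height in pulse_heights:
        if small_window[0] <= height < small_window[1]:
            counts["small"] += 1      # bead linked to the first probe
        elif large_window[0] <= height < large_window[1]:
            counts["large"] += 1      # bead linked to the second probe
        else:
            counts["rejected"] += 1   # debris or aggregates
    return counts


# Hypothetical pulse heights in arbitrary units.
pulses = [1.0, 1.1, 0.9, 8.2, 7.9, 8.0, 1.05, 20.0]
counts = count_beads(pulses, small_window=(0.5, 2.0), large_window=(6.0, 10.0))
```

The two tallies play the roles of the numbers X and Y. Pulses outside both windows are set aside and not assigned to either probe.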

Methods according to the invention also may be used to detect a loss of 
heterozygosity at an allele by determination of the amounts of maternal and paternal 
alleles comprising a genetic locus that includes at least one single-base polymorphism. 
A statistically-significant difference in the amounts of each allele is indicative of a 
mutation in an allelic region encompassing the single-base polymorphism. In this 
method, a region of an allele comprising a single-base polymorphism is identified, 
using, for example, a database, such as GenBank, or by other means known in the art. 
Probes are designed to hybridize to corresponding regions on both paternal and 
maternal alleles immediately 3' to the single base polymorphism as shown in Figure 3. 
After hybridization, a mixture of at least two of the four common dideoxynucleotides
is added to the sample, each labeled with a different detectable label. A DNA
polymerase is also added. Using allelic DNA adjacent the polymorphic nucleotide as a 
template, hybridized probe is extended by the addition of a single dideoxynucleotide 
that is the binding partner for the polymorphic nucleotide. After washing to remove 
unincorporated dideoxynucleotides, the dideoxynucleotides which have been 
incorporated into the probe extension are detected by determining the number of bound 
extended probes bearing each of the two dideoxynucleotides in, for example, a flow
cytometer or impedance counter. The presence of an almost equal number of two 
different labels means that there is normal heterozygosity at the polymorphic nucleotide.
The presence of a statistically-significant difference between the detected numbers of
the two labels means that a deletion of the region encompassing the polymorphic 
nucleotide has occurred in one of the alleles. 
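
The single-base extension step can be sketched in Python as follows. The sketch assumes standard Watson-Crick pairing, so the dideoxynucleotide added to each probe is the complement of the template base at the polymorphic site. The function name `count_incorporated_labels`, the choice of A and G as the two allelic bases, and the allele counts are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing: the incorporated dideoxynucleotide is the
# complement of the template base at the polymorphic site.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_incorporated_labels(template_bases):
    """Count extended probes by the dideoxynucleotide each incorporates."""
    return Counter(COMPLEMENT[base] for base in template_bases)


# Hypothetical sample: one allele carries A at the polymorphic site,
# the other carries G.
heterozygous = ["A"] * 500 + ["G"] * 500
with_deletion = ["A"] * 500 + ["G"] * 450   # some G-allele copies lost

labels_normal = count_incorporated_labels(heterozygous)
labels_loh = count_incorporated_labels(with_deletion)
```

In the first case the two labels appear in equal numbers. In the second case the label paired with the partly deleted allele is under-represented, which is the pattern the method looks for.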

Methods of the invention may be used to determine whether a patient is a 
candidate for follow-up invasive diagnostic or other procedures, such as endoscopy. 
For example, methods of the invention may be used to detect a mutation in a tumor 
suppressor gene or an oncogene in a subpopulation of cells in a stool sample obtained 
from a patient. An endoscopy procedure may then be performed on patients diagnosed 
with a mutation. A positive endoscopy result is then followed by polypectomy, surgery, 
or other treatment to remove cancerous or precancerous tissue. 

Accordingly, it is an object of the invention to provide methods for detecting 
genomic instability in a subpopulation of cells in a cellular sample. It is a further object 
of the invention to provide methods for detecting a genomic change in a subpopulation 
of cells, wherein the genomic change is indicative of cancer. It is another object of the 
invention to detect a loss of heterozygosity in a genomic region associated with cancer, 
such as a tumor suppressor region. It is yet another object of the invention to provide 
methods for detecting heterozygosity and the loss thereof at single-base polymorphic 
nucleic acids. Finally, it is an object of the invention to provide methods for the 
detection of cancer, and particularly colorectal cancer, by detection of cells or cellular
debris indicative of cancer in a heterogeneous sample, such as a stool sample.

Further aspects of the invention will become apparent upon consideration of the 
following detailed description and of the drawings. 

DESCRIPTION OF THE DRAWINGS 

Figure 1 is a flow chart showing sequential steps in methods of the invention. 

Figure 2 is a schematic diagram of a multi-orifice impedance counter of the type 
useful in accordance with the invention to count hybridization events; wherein 
reference numeral 1 indicates the direction of flow through the column; reference 
numeral 2 indicates a plunger means for forcing material downward in the column; 
reference numerals 3 and 4 are different-sized hybridization beads; reference numeral 
5 is an optional filter for extracting unwanted particles; reference numeral 6 indicates 
an array of orifices for measuring differential impedance; and reference numeral 7 is a
collection chamber. 

Figure 3 shows four possible probe attachment sites on allelic regions 
characterized by having a single base polymorphism. In Figure 3, sequence M1 is
SEQ ID NO:1; sequence M2 is SEQ ID NO: 2; sequence M3 is SEQ ID NO:3; sequence 
M4 is SEQ ID NO: 4; sequence F1 is SEQ ID NO: 5; sequence F2 is SEQ ID NO: 6; 
sequence F3 is SEQ ID NO: 7; sequence F4 is SEQ ID NO: 8. 

Figures 4A and 4B are model Gaussian distributions showing regions of low 
statistical probability. 

Figure 5 is a graph showing the probable values of N for a heterogeneous
population of cells in which 1% of the cells are mutated.

DETAILED DESCRIPTION OF THE INVENTION 

Methods according to the present invention are useful for the detection of 
genomic instability in a heterogeneous cellular sample in which the genomic instability 
occurs in only a small subpopulation of cells in the sample. Using traditional detection 
methods, such a subpopulation would be difficult, if not impossible, to detect - 
especially if the mutation that is causative of genomic instability is unknown at the time 
of detection or a clonally-impure cellular population is used. See, e.g., U.S. Patent 
No. 5,527,676 (reporting that a clonal population of cells should be used in order to 
detect a deletion in a p53 gene). Traditional methods for detection of mutations 
involved in carcinogenesis rely upon the use of a clonally-pure population of cells and 
such methods are best at detecting mutations that occur at known "hot spots" in 
oncogenes, such as k-ras. See, Sidransky, supra. Using the PCR-based methods of 
the art, an extremely large number of primers would have to be designed and sample 
would have to be tested numerous times to detect genomic instability in a cellular 
sample that is clonally-impure (i.e., a heterogeneous sample such as stool) and in 
which the mutation to be detected is unknown and exists in a very small number of 
cells. Moreover, PCR is not useful for detection of the absence of a genetic sequence, 
as in the case of the present methods for detecting loss of heterozygosity. Even after 
such repeated testing, a PCR-based method may not detect a mutation in a small 
number of cells in a clonally-impure population if, for example, primers bordering the 
site of the mutation are not used. Thus, in early-stage adenomas (when the population
of mutated cells is very small), methods of the art are, at best, impractical and may not 
work at all. 

In contrast, methods of the present invention are capable of detecting genomic 
instability in a small number of cells in an impure cellular population because such 
methods do not rely upon knowing which mutation exists and such methods are not 
affected by the presence in the sample of heterogeneous DNA. For example, in loss of 
heterozygosity, deletions occur over large portions of the genomes and entire alleles 
may be missing (or at least enough of an allele may be missing in order to render the 
allele non-functional). Methods of the invention comprise counting a number of a gene 
suspected of being mutated and comparing that number with the number of a gene 
known not to be mutated in the same sample. All that one needs to know is at least a 
portion of the sequence of a wild-type gene in which the mutation is suspected to occur 
and at least a portion of the wild-type sequence of a reference gene in which mutation
is not suspected to have occurred. 

Accordingly, methods of the present invention are useful for the detection of 
changes in a genomic nucleotide sequence present in a subpopulation of cells or 
debris therefrom in a sample. Such changes generally occur as a mutation (i.e., a 
substitution, modification, deletion, addition, or rearrangement) in a wild-type allelic 
sequence in a subpopulation of cells. In the case of a tumor suppressor gene, the 
mutation typically takes the form of a massive deletion characteristic of loss of 
heterozygosity. Often, as in the case of certain forms of cancer, disease-causing 
mutations initially occur in a single cell which then produces a small subpopulation of 
mutant cells. By the time clinical manifestations of the mutation are detected, the 
disease may have progressed to an incurable stage. Methods of the invention allow 
detection of a mutation when the mutation exists as only a small percentage of the total 
cells or cellular debris in a sample. 

Methods of the invention comprise a comparison of two wild-type sequences that 
are expected to be present in the sample in equal numbers in normal (non-mutated) 
cells. In a preferred embodiment, the comparison is between (1) an amount of a
genomic polynucleotide segment that is known or suspected not to be mutated in cells 
of the sample (the "reference") and (2) an amount of a wild-type (non-mutated) genomic 
polynucleotide segment suspected of being mutated in a subpopulation of cells in the
sample (the "target"). A statistically-significant difference between the amounts of the
two genomic polynucleotide segments indicates that a mutation has occurred. 
Specifically, in the case of a deletion in a tumor suppressor gene, the detected amount 
of the reference gene is significantly greater than the detected amount of the target 
gene. If a target sequence is amplified, as in the case of certain oncogene mutations, 
the detected amount of target is greater than the detected amount of the reference 
gene by a statistically-significant margin. 

Methods according to the art generally require the use of numerous probes, 
usually in the form of PCR primers and/or hybridization probes, in order to detect a 
deletion or a point mutation. However, because methods of the present invention 
involve quantitative detection of nucleotide sequences and quantitative comparisons 
between sequences that are known to be stable and those that are suspected of being 
unstable, only a few probes must be used in order to accurately assess cancer risk. In 
fact, a single set (pair) of probes is all that is necessary. The risk of cancer is indicated 
by the presence of a mutation in a genetic region known or suspected to be involved in 
oncogenesis. Patients who are identified as being at risk based upon tests conducted 
according to methods of the invention are then directed to other, typically invasive, 
procedures for confirmation and/or treatment of the disease. 

Quantitative sampling of a nucleotide sequence that is uniformly distributed in a 
biological sample typically follows a Poisson distribution. For large populations, such 
as the typical number of genomic polynucleotide segments in a biological sample, the 
Poisson distribution is similar to a normal (Gaussian) curve with a mean, N, and a 
standard deviation that may be approximated as the square root of N. 
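
The closeness of the Poisson and Gaussian distributions at large N can be checked numerically. The short Python sketch below compares the Poisson probability with a Gaussian density whose variance equals its mean. The mean of 10,000 and the function names are illustrative choices only.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k events, computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def normal_pdf(x: float, mean: float) -> float:
    """Gaussian density with variance equal to the mean (sigma = sqrt(mean))."""
    return math.exp(-((x - mean) ** 2) / (2 * mean)) / math.sqrt(2 * math.pi * mean)


n = 10_000                     # illustrative mean count
sigma = math.sqrt(n)           # standard deviation, here 100.0

# Ratio of the two distributions at the mean and one sigma above it.
ratio_at_mean = poisson_pmf(n, n) / normal_pdf(n, n)
ratio_one_sigma = poisson_pmf(n + 100, n) / normal_pdf(n + 100, n)
```

Both ratios are close to 1 for a mean of this size, and the agreement improves as N grows toward the hundreds of thousands of copies discussed below.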

Statistical significance between numbers of target and reference genes
obtained from a biological sample may be determined by any appropriate method.
See, e.g., Steel, et al., Principles and Procedures of Statistics, A Biometrical Approach
(McGraw-Hill, 1980), the disclosure of which is incorporated by reference herein. An 
exemplary method is to determine, based upon a desired level of specificity (tolerance 
of false positives) and sensitivity (tolerance of false negatives) and within a selected 
level of confidence, the difference between numbers of target and reference genes that 
must be obtained in order to reach a chosen level of statistical significance. A 
threshold issue in such a determination is the minimum number, N, of genes (for each
of target and reference) that must be available in a population in order to allow a
determination of statistical significance. The number N will depend upon the
assumption of a minimum number of mutant alleles in a sample containing mutant
alleles (assumed herein to be at least 1%) and the further assumption that normal
samples contain no mutant alleles. It is also assumed that a threshold difference
between the numbers of reference and target genes must be at least 0.5% for a
diagnosis that there is a mutation present in a subpopulation of cells in the sample.
Based upon the foregoing assumptions, it is possible to determine how large N must be
so that a detected difference between numbers of mutant and reference alleles of less
than 0.5% is truly a negative (i.e. no mutant subpopulation in the sample) result 99.9%
of the time.

The calculation of N for specificity, then, is based upon the probability of one 
sample measurement being in the portion of the Gaussian distribution covering the 
lowest 3.16% of the population (the area marked "A" in figure 4A) and the probability 
that the other sample measurement is in the portion of the Gaussian distribution 
covering the highest 3.16% of the population (the area marked "B" in figure 4B). Since 
the two sample measurements are independent events, the probability of both events 
occurring simultaneously is approximately 0.001 or 0.1%. Thus, 93.68% of the 
Gaussian distribution (100% - 2 x 3.16%) lies between the areas marked A and B in 
figure 5A. Statistical tables indicate that such area is equivalent to 3.72 standard 
deviations. Accordingly, 0.5%N equals 3.72 sigma. Since sigma (the standard 
deviation) is equal to the square root of N, the equation may be solved for N as 553,536. This means 
that if the lower of the two numbers representing reference and target is at least 
553,536 and if the patient is normal, the difference between the numbers will be less 
than 0.5% about 99.9% of the time. 

To determine the minimum N required for 99% sensitivity a similar analysis is 
performed. This time, one-tailed Gaussian distribution tables show that 1.28 standard 
deviations (sigma) from the mean cover 90% of the Gaussian distribution. Moreover, 
there is a 10% (the square root of 1%) probability of one of the numbers (reference or 
target) being in either the area marked "A" in figure 5 or in the area marked "B" in figure 
5. If the two population means are a total of 1% different and if there must be a 0.5% 
difference between the number of target and reference genes, then the distance from 
either mean to the threshold for statistical significance is equivalent to 0.25%N (See 
Figure 5) for 99% sensitivity. As shown in Figure 5, 0.25%N corresponds to about 
40% of one side of the Gaussian distribution. One-tailed statistical tables reveal that 
40% of the Gaussian distribution corresponds to 1.28 standard deviations. Therefore, 
1.28 sigma is equal to 0.0025N, and N equals 262,144. Thus, for abnormal samples, 
the difference will exceed 0.5% at least 99% of the time if the lower of the two numbers 
is at least 262,144. Conversely, an erroneous negative diagnosis will be made only 1% 
of the time under these conditions. 

In order to have both 99.9% specificity (avoidance of false positives) and 99% 
sensitivity (avoidance of false negatives), a sample with at least 553,536 (or roughly 
greater than 550,000) of both target and reference alleles should be used. A difference 
of at least 0.5% between the numbers obtained is significant at a confidence level of 
99.0% for sensitivity and a difference of less than 0.5% between the numbers is 
significant at a confidence level of 99.9% for specificity. As noted above, other 
standard statistical tests may be used in order to determine statistical significance and 
the foregoing represents one such test. 
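
The arithmetic in the preceding paragraphs can be retraced with a short script. The following is only a sketch under stated assumptions: the helper name min_copies and the variable names are illustrative and do not appear in the disclosure, and the script simply solves sigmas x sqrt(N) = fraction x N for N using the rounded table values quoted above (3.72 and 1.28 standard deviations). The standard normal quantiles are also computed directly, to show where the table values come from.

```python
from math import sqrt
from statistics import NormalDist


def min_copies(sigmas: float, fraction: float) -> float:
    """Solve sigmas * sqrt(N) = fraction * N for N (hypothetical helper)."""
    return (sigmas / fraction) ** 2


std_normal = NormalDist()  # mean 0, standard deviation 1

# Specificity: each measurement lies in its extreme tail with probability
# sqrt(0.001), about 3.16%, so both do so about 0.1% of the time.
tail_spec = sqrt(0.001)
width_spec = 2 * std_normal.inv_cdf(1 - tail_spec)  # close to 3.72 sigma

# Sensitivity: each measurement crosses its 0.25%N margin with probability
# sqrt(0.01) = 10%, so both do so about 1% of the time.
tail_sens = sqrt(0.01)
z_sens = std_normal.inv_cdf(1 - tail_sens)  # close to 1.28 sigma

# N from the rounded table values quoted in the text.
n_specificity = round(min_copies(3.72, 0.005))   # 0.5%N = 3.72 sigma
n_sensitivity = round(min_copies(1.28, 0.0025))  # 0.25%N = 1.28 sigma

# Both conditions must hold, so the larger N governs.
n_required = max(n_specificity, n_sensitivity)
print(n_specificity, n_sensitivity, n_required)
```

Because N scales with the square of the number of standard deviations, using unrounded quantiles instead of the table values shifts N slightly; the figures 553,536 and 262,144 correspond to the rounded values 3.72 and 1.28.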

Based upon the foregoing explanation, the skilled artisan appreciates that 
methods of the invention are useful to detect mutations in a subpopulation of 
polynucleotides in any biological sample. For example, methods disclosed herein may 
be used to detect allelic loss (the loss of heterozygosity) associated with diseases such 
as cancer. Additionally, methods of the invention may be used to detect a deletion or a 
base substitution mutation causative of a metabolic error, such as complete or partial 
loss of enzyme activity. For purposes of exemplification, the following provides details 
of the use of methods according to the present invention in colon cancer detection. 
Inventive methods are especially useful in the early detection of a mutation (and 
especially a large deletion typical of loss of heterozygosity) in a tumor suppressor 
gene. Accordingly, while exemplified in the following manner, the invention is not so 
limited and the skilled artisan will appreciate its wide range of applicability upon 
consideration thereof. 
Methods according to the invention preferably comprise one of three types of 
detection regimens. In a first preferred detection regimen, an amount of a 
polynucleotide known or suspected to be mutated is compared to an amount of a 
reference polynucleotide known or suspected not to be mutated. In a second 
preferred detection regimen, an amount of a polymorphic nucleotide on a maternal 
allele is compared to an amount of the corresponding polymorphic nucleotide on the 
corresponding paternal allele. Finally, a third detection regimen comprises a 
comparison of a microsatellite repeat region in a normal allele with the corresponding 
microsatellite region in an allele known or suspected to be mutated. All three 
exemplary detection means comprise determining whether a difference exists between 
the amounts of each nucleic acid being measured. The presence of a statistically- 
significant difference is indicative that a mutation has occurred in one of the nucleic 
acids being measured. Thus, methods described below are generally applicable to all 
forms of the invention, the variations of which are shown in figure 1. 

I. Preparation of a Stool Sample 

A sample prepared from stool voided by a patient should comprise at least a 
cross-section of the voided stool. As noted above, stool is not homogenous with 
respect to sloughed cells. As stool passes through the colon, it absorbs sloughed cells 
from regions of the colonic epithelium with which it makes contact. Thus, sloughed 
cells from a polyp are absorbed on only one surface of the forming stool (except near 
the cecum where stool is still liquid and is homogenized by intestinal peristalsis). 
Taking a representative sample of stool (i.e., at least a cross-section) and 
homogenizing it ensures that sloughed cells from all epithelial surfaces of the colon will 
be present for analysis in the processed stool sample. Stool is voided into a receptacle 
that is preferably small enough to be transported to a testing facility. The receptacle 
may be fitted to a conventional toilet such that the receptacle accepts stool voided in a 
conventional manner. The receptacle may comprise a mesh or a screen of sufficient 
size and placement such that stool is retained while urine is allowed to pass through 
the mesh or screen and into the toilet. The receptacle may additionally comprise 
means for homogenizing voided stool. Moreover, the receptacle may comprise means 
for introducing homogenization buffer or one or more preservatives, such as alcohol or 
a high salt concentration solution, in order to neutralize bacteria present in the stool 
sample and to inhibit degradation of DNA. 

The receptacle, whether adapted to fit a toilet or simply adapted for receiving the 
voided stool sample, preferably has sealing means sufficient to contain the voided stool 
sample and any solution added thereto and to prevent the emanation of odors. The 
receptacle may have a support frame which is placed directly over a toilet bowl. The 
support frame has attached thereto an articulating cover which may be placed in a 
raised position for depositing of sample or in a closed position (not shown) for sealing 
voided stool within the receptacle. The support frame additionally has a central 
opening traversing from a top surface through to a bottom surface of the support frame. 
The bottom surface directly communicates with a top surface of the toilet. Extending 
from the bottom surface of the support frame and encompassing the entire 
circumference of the central opening is a means for capturing voided stool. The means 
for capturing voided stool may be fixedly attached to the support frame or may be 
removably attached for removal subsequent to deposition of stool. 

Once obtained, the stool sample is homogenized in an appropriate buffer, such 
as phosphate buffered saline or a chaotropic salt solution. Homogenization means and 
materials for homogenization are generally known in the art. See, e.g., U.S. Patent No. 
4,101,279. Thus, particular homogenization methods may be selected by the skilled 
artisan. Methods for further processing and analysis of a biological sample, such as a 
stool sample, are presented below. 

II. Methods for Detection of Colon Cancer or Precancer 

A. Reference-Target 

For exemplification, methods of the invention are used to detect a deletion or 
other mutation in the p53 tumor suppressor gene in cells obtained from a 
representative stool sample. The p53 gene is a good choice because the loss of 
heterozygosity in p53 is often associated with colorectal cancer. An mRNA sequence 
corresponding to the DNA coding region for p53 is reported as GenBank Accession No. 
M92424. The skilled artisan understands that methods described herein may be used 
to detect mutations in any gene and that detection of a p53 deletion is exemplary of 
such methods. At least a cross-section of a voided stool sample is obtained and 
prepared as described immediately above. DNA or RNA may optionally be isolated 
from the sample according to methods known in the art. See, Smith-Ravin, et al., Gut, 
36: 81-86 (1995), incorporated by reference herein. However, methods of the 
invention may be performed on unprocessed stool. 

Nucleic acids may be sheared or cut into small fragments by, for example, 
restriction digestion. The size of nucleic acid fragments produced is not critical, subject 
to the limitations described below. A target allele that is suspected of being mutated 
(p53 in this example) and a reference allele are chosen. A reference allele may be any 
allele known or suspected not to be mutated in colon cancer cells. Single-stranded 
nucleic acid fragments may be prepared using well-known methods. See, e.g., 
Sambrook, et al., Molecular Cloning, A Laboratory Manual (1989), incorporated by 
reference herein. 

Either portions of a coding strand or its complement may be detected in methods 
according to the invention. For exemplification, detection of the coding strand of p53 
and reference allele are described. Complement to both p53 and reference allele are 
removed by hybridization to anti-complement oligonucleotide probes (isolation probes) 
and subsequent removal of duplex formed thereby. Methods for removal of 
complement strands from a mixture of single-stranded oligonucleotides are known in 
the art and include techniques such as affinity chromatography. Upon converting 
double-stranded DNA to single-stranded DNA, sample is passed through an affinity 
column comprising bound isolation probe that is complementary to the sequence to be 
isolated away from the sample. Conventional column chromatography is appropriate 
for isolation of complement. An affinity column packed with sepharose or any other 
appropriate materials with attached complementary nucleotides may be used to isolate 
complement DNA in the column, while allowing DNA to be analyzed to pass through the 
column. See Sambrook, supra. As an alternative, isolation beads may be used to 
exclude complement as discussed in detail below. 

After removal of complement strands, first oligonucleotide probes which 
hybridize to at least a portion of the p53 allele and second oligonucleotide probes that 
hybridize to at least a portion of the reference allele are obtained. The probes are 
labeled with a detectable label, such as fluorescein or detectable particles. Distinct 
labels for the probes are preferred. 
Labeled probes are then exposed to sample under hybridization conditions. 
Such conditions are well-known in the art. See, e.g., Wallace, et al., Nucleic Acids 
Res., 6: 3543-3557 (1979), incorporated by reference herein. First and second 
oligonucleotide probes that are distinctly labeled (i.e., with different radioactive 
isotopes, fluorescent means, or with beads of different size, see infra) are applied to a 
single aliquot of sample. After exposure of the probes to sample under hybridization 
conditions, sample is washed to remove any unhybridized probe. Thereafter, 
hybridized probes are detected separately for p53 hybrids and reference allele hybrids. 
Standards may be used to establish background and to equilibrate results. Also, if 
differential fluorescent labels are used, the number of probes may be determined by 
counting differential fluorescent events in a sample that has been diluted sufficiently to 
enable detection of single fluorescent events in the sample. Duplicate samples may be 
analyzed in order to confirm the accuracy of results obtained. 

If there is a statistically-significant difference between the amount of p53 
detected and the amount of the reference allele detected, it may be assumed that a 
mutation has occurred in p53 and the patient is at risk for developing or has developed 
colon cancer. Statistical significance may be determined by any known method. A 
preferred method is outlined above. 
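
As an illustration only, the comparison of target and reference counts might be expressed as a simple decision rule. The sketch below assumes the 0.5% threshold and the minimum of 553,536 copies discussed above; the function name classify, the result labels, and the choice to express the difference relative to the reference count are assumptions made for the example and are not specified in the disclosure.

```python
MIN_COPIES = 553_536         # minimum count of each allele, from the text
THRESHOLD_FRACTION = 0.005   # 0.5% difference threshold, from the text


def classify(target_count: int, reference_count: int) -> str:
    """Apply the threshold rule to a pair of counts (hypothetical helper)."""
    # Too few copies of either allele: the stated specificity and
    # sensitivity cannot be expected to hold.
    if min(target_count, reference_count) < MIN_COPIES:
        return "insufficient sample"
    # Assumption: the difference is expressed relative to the reference count.
    difference = abs(reference_count - target_count) / reference_count
    if difference >= THRESHOLD_FRACTION:
        return "possible mutation"
    return "negative"


print(classify(594_000, 600_000))  # 1% difference
print(classify(599_000, 600_000))  # about 0.17% difference
print(classify(100_000, 101_000))  # below the minimum count
```

A result of "possible mutation" in this sketch corresponds only to the statistically-significant difference described above; it is not itself a diagnosis.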

The determination of a p53 mutation allows a clinician to recommend further 
treatment, such as endoscopy procedures, in order to further diagnose and, if 
necessary, treat the patient's condition. The following examples illustrate methods of 
the invention that allow direct quantification of hybridization events. 

1. Method for Increased Quantitation of Target and Reference 
Polynucleotides 

Enhanced quantification of binding events between hybridization probes and 
target or reference is accomplished by coupling hybridization probes to particles, such 
as beads (hybridization beads). 

In order to obtain a precise quantitative measure of the amount of a 
polynucleotide in a sample, hybridization beads are constructed prior to conducting 
hybridizations, such that each bead has attached thereto a single oligonucleotide 
probe. 
a. Method for Preparation of Probe-Bead Combinations 
A single probe is attached to a bead by incubating a large excess of 
hybridization beads with oligonucleotide probes of a given type (i.e., either first or 
second oligonucleotide probes). Coupling of probe to bead is accomplished using an 
affinity-binding pair. For example, beads may be coated with avidin or streptavidin and 
probes may be labeled with biotin to effect attachment of the probe to the bead. The 
mixture of beads and probes is agitated such that virtually 100% of the probes are 
bound to beads. The mixture is then exposed to a matrix, such as an affinity column or 
a membrane coated with oligonucleotides that are complementary to the probe. Only 
beads that have an attached probe will adhere to the matrix, the rest being washed 
away. Beads with coupled probe are then released from the matrix by melting 
hybridizations between probe and complement. Multiple exposures to the matrix and 
pre-washing of the column reduce non-specific binding. Moreover, naked beads (i.e., 
without attached probe) may be exposed to the matrix to determine a background 
number of beads that can be expected to attach to the matrix in the absence of probe. 

By using a vast excess of beads relative to probe as described above, it is 
expected that the vast majority of recovered beads will have only one attached probe. 
For example, if a mixture has a ratio of 1 probe to 1000 beads, it is expected that only 
about 1 bead in a million will have two attached probes and even fewer than one bead in 
a million will have more than two attached probes. Accordingly, hybridization beads 
are provided in an effective 1:1 ratio with probe which allows for precise quantification 
of target and reference polynucleotide as described below. 
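
The expected loading of beads at a 1:1000 probe-to-bead ratio can be estimated with a short calculation. The sketch below assumes that probes attach to beads independently and at random, so that the number of probes per bead follows a Poisson distribution with a mean of 0.001; that model, and the names used, are assumptions made for illustration rather than part of the disclosure.

```python
from math import exp, factorial


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k probes on a bead under a Poisson model."""
    return mean ** k * exp(-mean) / factorial(k)


probes_per_bead = 1 / 1000  # 1 probe per 1000 beads, from the text

p_none = poisson_pmf(0, probes_per_bead)
p_one = poisson_pmf(1, probes_per_bead)
p_two = poisson_pmf(2, probes_per_bead)   # about 5e-7
p_more = 1 - p_none - p_one - p_two       # more than two probes

# Among beads that carry at least one probe (those retained by the matrix),
# the fraction carrying exactly one probe.
single_fraction = p_one / (1 - p_none)

print(f"two probes: {p_two:.2e}, more than two: {p_more:.2e}")
print(f"recovered beads with exactly one probe: {single_fraction:.4%}")
```

Under this model the fraction of beads carrying two probes is on the order of one in a million, and the fraction carrying more than two is far smaller, in line with the expectation stated above.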

For each assay described below, two distinct hybridization beads are used. A 
first hybridization bead has attached thereto a single first oligonucleotide probe that is 
complementary to at least a portion of a target polynucleotide (e.g., a p53 allele). A 
second hybridization bead, of a size distinct from the first hybridization bead, has 
attached thereto a single second oligonucleotide probe that is complementary to at 
least a portion of a reference polynucleotide (i.e., one that is known or suspected not 
to be mutated in the sample). 
b. Use of Beads to Quantify Target and Reference Polynucleotides 

DNA is melted (denatured to form single-stranded DNA) by well-known methods. 
See, e.g., Gyllensten, et al., in Recombinant DNA Methodology II, 565-578 (Wu, ed., 
1995), incorporated by reference herein. According to methods of the invention, one 
may detect either a coding strand or its complement in order to quantify target and/or 
reference polynucleotide. For purposes of illustration, the present example assumes 
detection of the coding strand. 

2. Removal of Complement 
Single-stranded complement of the target polynucleotide (e.g., p53) and 
reference polynucleotide are removed from the sample by binding to oligonucleotide 
probes that are complementary to target or reference complement. Such probes, 
referred to herein as isolation probes, are attached to isolation beads prior to their 
introduction into the sample. The beads may be magnetized. Thus, when magnetized 

ABSTRACT 



The present invention provides methods for screening for the presence of a subpopulation of 
cancerous or precancerous cells in a heterogenous cellular sample, such as a stool sample. 
The methods take advantage of the recognition that cellular debris from cancerous and 
precancerous cells is deposited onto only a longitudinal stripe of stool as the stool is forming 
in the colon. Accordingly, methods of the invention comprise obtaining a representative 
sample, such as a cross-sectional sample of stool in order to ensure that any cellular debris 
that is shed by colonic cells is obtained in the sample. 



